Overview

Dataset statistics

Number of variables: 8
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 48.1 KiB
Average record size in memory: 64.2 B

Variable types

Numeric: 7
Categorical: 1

Alerts

SkinThickness is highly overall correlated with Insulin (High correlation)
Insulin is highly overall correlated with SkinThickness (High correlation)
BloodPressure is highly overall correlated with BMI (High correlation)
BMI is highly overall correlated with BloodPressure (High correlation)
BloodPressure has 35 (4.6%) zeros (Zeros)
SkinThickness has 227 (29.6%) zeros (Zeros)
Insulin has 374 (48.7%) zeros (Zeros)
BMI has 11 (1.4%) zeros (Zeros)
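
The zero percentages in these alerts follow from the reported counts and the 768 observations. The sketch below recomputes them; the dictionary simply copies the counts from the alerts, and rounding to one decimal is an assumption about how the report formats percentages.

```python
# Zero counts copied from the alerts above; 768 is the number of observations.
N_OBSERVATIONS = 768
zero_counts = {
    "BloodPressure": 35,
    "SkinThickness": 227,
    "Insulin": 374,
    "BMI": 11,
}


def zero_share(count, total=N_OBSERVATIONS):
    """Return the percentage of zero-valued entries, rounded to one decimal."""
    return round(100.0 * count / total, 1)


zero_percentages = {name: zero_share(count) for name, count in zero_counts.items()}

for name, pct in zero_percentages.items():
    print(f"{name}: {pct}% zeros")
```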

Reproduction

Analysis started: 2022-11-30 03:40:39.575666
Analysis finished: 2022-11-30 03:40:54.385014
Duration: 14.81 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
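
A report of this kind can typically be regenerated with a few lines of Python. The following is a minimal sketch, assuming pandas and pandas-profiling 3.x are installed; the function name `build_report`, the report title, and the file names are hypothetical, since the report does not record the input file.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imports are local so that defining this function does not require
    # pandas or pandas-profiling to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is a hypothetical placeholder, not taken from the report.
    profile = ProfileReport(df, title="Diabetes dataset profile")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("diabetes.csv")` with a suitable CSV (the file name is an assumption) would write `report.html` to the working directory.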

Variables

Glucose
Real number (ℝ)

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.11719
Minimum: 0
Maximum: 199
Zeros: 5
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 79
Q1: 99
Median: 117
Q3: 142
95-th percentile: 180.65
Maximum: 199
Range: 199
Interquartile range (IQR): 43

Descriptive statistics

Standard deviation: 31.805091
Coefficient of variation (CV): 0.26259767
Kurtosis: 0.63709916
Mean: 121.11719
Median Absolute Deviation (MAD): 20
Skewness: 0.13293448
Sum: 93018
Variance: 1011.5638
Monotonicity: Not monotonic
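
Several of the Glucose statistics above are simple functions of one another. As a rough consistency check, the sketch below recomputes the mean, variance, coefficient of variation, IQR, and range from the other reported values. All inputs are copied from this report; nothing is computed from the raw data.

```python
import math

# Values copied from the Glucose statistics above.
n = 768                      # number of observations
total = 93018                # Sum
std = 31.805091              # Standard deviation
q1, q3 = 99, 142             # Q1 and Q3
minimum, maximum = 0, 199

mean = total / n             # mean = sum / number of observations
variance = std ** 2          # variance is the squared standard deviation
cv = std / mean              # coefficient of variation = std / mean
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(f"mean={mean:.5f} variance={variance:.4f} cv={cv:.8f}")
print(f"iqr={iqr} range={value_range}")
```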
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
100                 17     2.2%
99                  17     2.2%
150                 15     2.0%
106                 14     1.8%
129                 14     1.8%
111                 14     1.8%
125                 13     1.7%
105                 13     1.7%
112                 13     1.7%
108                 13     1.7%
Other values (126)  625    81.4%

Minimum 10 values

Value  Count  Frequency (%)
0      5      0.7%
44     1      0.1%
56     1      0.1%
57     2      0.3%
61     1      0.1%
62     1      0.1%
65     1      0.1%
67     1      0.1%
68     3      0.4%
71     4      0.5%

Maximum 10 values

Value  Count  Frequency (%)
199    1      0.1%
198    1      0.1%
197    3      0.4%
196    3      0.4%
195    2      0.3%
194    3      0.4%
193    2      0.3%
191    1      0.1%
190    1      0.1%
189    3      0.4%

BloodPressure
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 47
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.076823
Minimum: 0
Maximum: 122
Zeros: 35
Zeros (%): 4.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 38.7
Q1: 62
Median: 72
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 122
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 19.367794
Coefficient of variation (CV): 0.28038049
Kurtosis: 5.1502014
Mean: 69.076823
Median Absolute Deviation (MAD): 8
Skewness: -1.8369976
Sum: 53051
Variance: 375.11143
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=47)]

Common values

Value              Count  Frequency (%)
70                 57     7.4%
74                 52     6.8%
78                 45     5.9%
68                 45     5.9%
64                 43     5.6%
72                 43     5.6%
80                 40     5.2%
76                 39     5.1%
60                 37     4.8%
0                  35     4.6%
Other values (37)  332    43.2%

Minimum 10 values

Value  Count  Frequency (%)
0      35     4.6%
24     1      0.1%
30     2      0.3%
38     1      0.1%
40     1      0.1%
44     4      0.5%
46     2      0.3%
48     5      0.7%
50     14     1.8%
52     11     1.4%

Maximum 10 values

Value  Count  Frequency (%)
122    1      0.1%
114    1      0.1%
110    3      0.4%
108    2      0.3%
106    3      0.4%
104    2      0.3%
102    1      0.1%
100    3      0.4%
98     3      0.4%
96     4      0.5%

SkinThickness
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 51
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.536458
Minimum: 0
Maximum: 99
Zeros: 227
Zeros (%): 29.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 23
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 99
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 15.952218
Coefficient of variation (CV): 0.77677549
Kurtosis: -0.52007187
Mean: 20.536458
Median Absolute Deviation (MAD): 12
Skewness: 0.1093725
Sum: 15772
Variance: 254.47325
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
0                  227    29.6%
32                 31     4.0%
30                 27     3.5%
27                 23     3.0%
23                 22     2.9%
33                 20     2.6%
28                 20     2.6%
18                 20     2.6%
31                 19     2.5%
19                 18     2.3%
Other values (41)  341    44.4%

Minimum 10 values

Value  Count  Frequency (%)
0      227    29.6%
7      2      0.3%
8      2      0.3%
10     5      0.7%
11     6      0.8%
12     7      0.9%
13     11     1.4%
14     6      0.8%
15     14     1.8%
16     6      0.8%

Maximum 10 values

Value  Count  Frequency (%)
99     1      0.1%
63     1      0.1%
60     1      0.1%
56     1      0.1%
54     2      0.3%
52     2      0.3%
51     1      0.1%
50     3      0.4%
49     3      0.4%
48     4      0.5%

Insulin
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 186
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.799479
Minimum: 0
Maximum: 846
Zeros: 374
Zeros (%): 48.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 30.5
Q3: 127.25
95-th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.244
Coefficient of variation (CV): 1.4441699
Kurtosis: 7.2142596
Mean: 79.799479
Median Absolute Deviation (MAD): 30.5
Skewness: 2.2722509
Sum: 61286
Variance: 13281.18
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0                   374    48.7%
105                 11     1.4%
130                 9      1.2%
140                 9      1.2%
120                 8      1.0%
94                  7      0.9%
180                 7      0.9%
100                 7      0.9%
135                 6      0.8%
115                 6      0.8%
Other values (176)  324    42.2%

Minimum 10 values

Value  Count  Frequency (%)
0      374    48.7%
14     1      0.1%
15     1      0.1%
16     1      0.1%
18     2      0.3%
22     1      0.1%
23     2      0.3%
25     1      0.1%
29     1      0.1%
32     1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
846    1      0.1%
744    1      0.1%
680    1      0.1%
600    1      0.1%
579    1      0.1%
545    1      0.1%
543    1      0.1%
540    1      0.1%
510    1      0.1%
495    2      0.3%

BMI
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.992578
Minimum: 0
Maximum: 67.1
Zeros: 11
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21.8
Q1: 27.3
Median: 32
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 67.1
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 7.8841603
Coefficient of variation (CV): 0.24643717
Kurtosis: 3.2904429
Mean: 31.992578
Median Absolute Deviation (MAD): 4.6
Skewness: -0.42898159
Sum: 24570.3
Variance: 62.159984
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
0                   11     1.4%
32.4                10     1.3%
33.3                10     1.3%
30.1                9      1.2%
32.8                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
Other values (238)  664    86.5%

Minimum 10 values

Value  Count  Frequency (%)
0      11     1.4%
18.2   3      0.4%
18.4   1      0.1%
19.1   1      0.1%
19.3   1      0.1%
19.4   1      0.1%
19.5   2      0.3%
19.6   3      0.4%
19.9   1      0.1%
20     1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
67.1   1      0.1%
59.4   1      0.1%
57.3   1      0.1%
55     1      0.1%
53.2   1      0.1%
52.9   1      0.1%
52.3   2      0.3%
50     1      0.1%
49.7   1      0.1%
49.6   1      0.1%

DiabetesPedigreeFunction
Real number (ℝ)

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
Median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.3313286
Coefficient of variation (CV): 0.70215138
Kurtosis: 5.5949535
Mean: 0.4718763
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.9199111
Sum: 362.401
Variance: 0.10977864
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.268               5      0.7%
0.207               5      0.7%
0.261               5      0.7%
0.259               5      0.7%
0.238               5      0.7%
0.19                4      0.5%
0.263               4      0.5%
0.299               4      0.5%
Other values (507)  719    93.6%

Minimum 10 values

Value  Count  Frequency (%)
0.078  1      0.1%
0.084  1      0.1%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.1%
0.092  1      0.1%
0.096  1      0.1%
0.1    1      0.1%
0.101  1      0.1%
0.102  1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
2.42   1      0.1%
2.329  1      0.1%
2.288  1      0.1%
2.137  1      0.1%
1.893  1      0.1%
1.781  1      0.1%
1.731  1      0.1%
1.699  1      0.1%
1.698  1      0.1%
1.6    1      0.1%

Age
Real number (ℝ)

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24349
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.758182
Coefficient of variation (CV): 0.35369879
Kurtosis: 0.64273721
Mean: 33.24349
Median Absolute Deviation (MAD): 7
Skewness: 1.1286322
Sum: 25531
Variance: 138.25485
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
22                 72     9.4%
21                 63     8.2%
25                 48     6.2%
24                 46     6.0%
23                 38     4.9%
28                 35     4.6%
26                 33     4.3%
27                 32     4.2%
29                 29     3.8%
31                 24     3.1%
Other values (42)  348    45.3%

Minimum 10 values

Value  Count  Frequency (%)
21     63     8.2%
22     72     9.4%
23     38     4.9%
24     46     6.0%
25     48     6.2%
26     33     4.3%
27     32     4.2%
28     35     4.6%
29     29     3.8%
30     21     2.7%

Maximum 10 values

Value  Count  Frequency (%)
81     1      0.1%
72     1      0.1%
70     1      0.1%
69     2      0.3%
68     1      0.1%
67     3      0.4%
66     4      0.5%
65     3      0.4%
64     1      0.1%
63     4      0.5%

Outcome
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value  Count
0      500
1      268

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 768
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring characters

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  768    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  768    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  768    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Interactions

[Figures: 49 pairwise interaction plots (7 × 7 numeric variables)]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns, according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
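
To make the Numerical-Categorical case concrete, here is a minimal sketch of Cramér's V computed after discretizing a numerical column. It uses plain, uncorrected Cramér's V and equal-width bins; both are assumptions for illustration and may differ from what the profiling library does internally. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of an equal-width bin (0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data (made up): a numeric column and a binary label.
numeric = [80, 95, 100, 110, 150, 160, 170, 190]
label = [0, 0, 0, 0, 1, 1, 1, 1]
v = cramers_v(equal_width_bins(numeric, n_bins=2), label)
print(f"Cramér's V with 2 bins: {v:.3f}")
```

Changing `n_bins` changes how finely the numeric column is cut, which is the granularity effect described above.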

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
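
The calculation just described can be sketched in a few lines of plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. Giving tied values the average of their ranks is a common convention and is assumed here; the function names and sample data are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their std deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


# A monotonic but nonlinear relationship (made-up data).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
print(f"rho = {rho:.3f}")
```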

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
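
As an illustration of this formula and of the invariance property, the sketch below computes r in plain Python and checks that shifting and rescaling each variable (with positive scale factors) leaves r unchanged. The data and the function name `pearson_r` are made up for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# Separate changes of location and (positive) scale for each variable.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(f"r = {r_original:.4f}, after rescaling = {r_rescaled:.4f}")
```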

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
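
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; it makes no correction for ties, whereas many libraries (SciPy's default, for example) compute the tie-corrected tau-b, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie; it counts toward neither group
    return (concordant - discordant) / len(pairs)


tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(f"tau = {tau:.3f}")
```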

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
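
Both displays are built from the same information: for every cell, whether a value is present or missing. The sketch below shows that idea on made-up rows, representing a missing value as `None`. The function names and the data are hypothetical; the profiled dataset itself reports 0 missing cells.

```python
def nullity_matrix(rows, columns):
    """True where a value is present, False where it is missing (None)."""
    return [[row.get(col) is not None for col in columns] for row in rows]


def missing_per_column(rows, columns):
    """Count the missing cells in each column."""
    matrix = nullity_matrix(rows, columns)
    return {
        col: sum(1 for present in column if not present)
        for col, column in zip(columns, zip(*matrix))
    }


# Made-up rows for illustration only.
toy_rows = [
    {"a": 1, "b": None},
    {"a": 2, "b": 3},
    {"a": None, "b": None},
]
toy_columns = ["a", "b"]

missing = missing_per_column(toy_rows, toy_columns)
print(missing)
```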

Sample

First rows

     Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age  Outcome
0    148      50             35             0        33.6  0.627                     50   1
1    85       66             29             0        26.6  0.351                     31   0
2    183      64             0              0        23.3  0.672                     52   1
3    150      66             23             94       28.1  0.167                     21   0
4    150      40             35             168      43.1  2.288                     33   1
5    150      74             0              0        25.6  0.201                     30   0
6    150      50             32             88       31.0  0.248                     26   1
7    150      0              0              0        35.3  0.134                     29   0
8    150      70             45             543      30.5  0.158                     35   1
9    150      96             0              0        0.0   0.232                     54   1

Last rows

     Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age  Outcome
758  106      76             0              0        37.5  0.197                     26   0
759  190      92             0              0        35.5  0.278                     66   1
760  88       58             26             16       28.4  0.766                     22   0
761  170      74             31             0        44.0  0.403                     43   1
762  89       62             0              0        22.5  0.142                     33   0
763  101      76             48             180      32.9  0.171                     63   0
764  122      70             27             0        36.8  0.340                     27   0
765  121      72             23             112      26.2  0.245                     30   0
766  126      60             0              0        30.1  0.349                     47   1
767  93       70             31             0        30.4  0.315                     23   0